Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

Journal of Beijing University of Posts and Telecommunications, 2010, Vol. 33, Issue (4): 44-48. doi: 10.13190/jbupt.201004.44.zhangjp

• Paper •

A Biased-Sampling Ensemble Classifier for High-Speed Data Streams

YANG Xianfei (杨显飞), YANG Jing (杨静), ZHANG Jianpei (张健沛)

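  1. Harbin Engineering University
  • Received: 2009-10-10  Revised: 2010-01-14  Online: 2010-08-28  Published: 2010-05-21
  • Corresponding author: YANG Xianfei (杨显飞)  E-mail: yangxianfei@eyou.com
  • Supported by:

    National level. Major Research Plan project of the National Natural Science Foundation of China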

Ensemble Classifiers Research for Classify High Speed Data Stream Based on Biased Sample

  • Received: 2009-10-10  Revised: 2010-01-14  Online: 2010-08-28  Published: 2010-05-21

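Abstract (translated from the Chinese):

When a high-speed data stream arrives faster than an ensemble classifier can process it, the ensemble cannot train on all of the recently arrived data to update its model. To address this problem, a biased-sampling ensemble classifier algorithm is proposed. The expected error of the ensemble is analyzed by bias-variance decomposition, and biased sampling is carried out by computing the expected-error contribution of each candidate instance, which effectively shortens the time needed to train and update the ensemble. The method is compared with a random-sampling ensemble classifier. Theoretical analysis and experimental results show that, at the same sampling ratio, the proposed method effectively improves the classification accuracy of the ensemble.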

Keywords (translated from the Chinese): data stream, ensemble classifier, biased sampling, bias-variance decomposition

Abstract:

In a high-speed data stream, data arrive faster than the ensemble classifiers can process them, so the classifiers cannot train on all recently arrived data to update themselves. An ensemble classifier based on biased sampling is proposed. The expected error is analyzed through bias-variance decomposition, and the candidate data are sampled in a biased manner according to each instance's contribution to the expected error. This reduces the time needed to train and update the ensemble classifiers. The method is compared with random-sampling ensemble classifiers, and the results indicate that it achieves higher prediction accuracy at the same sampling proportion.

Key words: data stream, ensemble classifiers, biased sampling, bias-variance decomposition
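
The abstract does not give the formula for the expected-error contribution, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes a 0-1 loss decomposition in which an instance's bias is 1 when the ensemble's majority vote is wrong and its variance is the fraction of members that disagree with the majority vote. The function names, the `floor` parameter, and the weighted-sampling scheme are hypothetical choices for illustration, not the authors' algorithm.

```python
import random
from collections import Counter


def error_contribution(votes, label):
    """Assumed proxy score: 0-1 loss bias plus variance for one instance.

    votes: class labels predicted by each ensemble member.
    label: the true class label of the instance.
    """
    main = Counter(votes).most_common(1)[0][0]  # majority (main) prediction
    bias = 1.0 if main != label else 0.0        # majority vote is wrong
    variance = sum(v != main for v in votes) / len(votes)  # disagreement rate
    return bias + variance


def biased_sample(chunk_votes, labels, ratio, seed=0, floor=0.05):
    """Pick about ratio * n instance indices, favoring high-score instances.

    floor (hypothetical) keeps every weight positive so that easy
    instances still have a small chance of being selected.
    """
    rng = random.Random(seed)
    n = len(labels)
    k = min(n, max(1, round(ratio * n)))
    weights = [floor + error_contribution(v, y)
               for v, y in zip(chunk_votes, labels)]
    # Weighted sampling without replacement: draw key u ** (1 / w) per item
    # and keep the k largest keys (Efraimidis-Spirakis scheme).
    keys = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    return sorted(i for _, i in sorted(keys, reverse=True)[:k])


if __name__ == "__main__":
    # Toy chunk: three ensemble members voting on four labeled instances.
    votes = [["a", "a", "a"], ["a", "a", "b"], ["a", "b", "b"], ["b", "b", "b"]]
    labels = ["a", "b", "b", "b"]
    print(biased_sample(votes, labels, ratio=0.5))
```

Replacing the weights with a constant turns this into uniform random sampling, which corresponds to the random-sampling baseline that the abstract compares against.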